Smart Paper Notes

LC-FDNet: Learned Lossless Image Compression with Frequency Decomposition Network

Hochang Rhee, Yeong Il Jang, Seyun Kim, Nam Ik Cho

Category: Computer Vision

2021-12-13

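Recent learning-based lossless image compression methods encode an image in units of subimages and achieve performance comparable to conventional non-learning algorithms. However, these methods do not account for the performance drop in high-frequency regions, giving equal consideration to low- and high-frequency regions. In this paper, we propose a new lossless image compression method that encodes in a coarse-to-fine manner so that low- and high-frequency regions are separated and processed differently. We first compress the low-frequency components and then use them as additional input for encoding the remaining high-frequency regions. The low-frequency components act as a strong prior in this case, which improves estimation in the high-frequency regions. In addition, we design the frequency decomposition process to adapt to color channel, spatial location, and image characteristics. As a result, our method derives an image-specific optimal ratio of low- to high-frequency components. Experiments show that the proposed method achieves state-of-the-art performance on benchmark high-resolution datasets.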

Translated by Google Translate

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Category: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

MEDS-Net: Self-Distilled Multi-Encoders Network with Bi-Direction Maximum Intensity projections for Lung Nodule Detection

Muhammad Usman, Azka Rehman, Abdullah Shahid, Siddique Latif, Shi Sub Byon, Byoung Dai Lee, Sung Hyun Kim, Byung il Lee, Yeong Gil Shin

Category: Computer Vision

2022-10-30

In this study, we propose a lung nodule detection scheme that fully incorporates the clinical workflow of radiologists. In particular, we exploit bi-directional maximum intensity projection (MIP) images of various thicknesses (i.e., 3, 5, and 10 mm) along with a 3D patch of the CT scan consisting of 10 adjacent slices, and feed them into a self-distillation-based Multi-Encoders Network (MEDS-Net). The proposed architecture first condenses the 3D patch input to three channels using a dense block made up of dense units, which effectively examine nodule presence in the 2D axial slices. This condensed information, along with the forward and backward MIP images, is fed to three different encoders to learn the most meaningful representation, which is forwarded into the decoder block at various levels. At the decoder block, we employ a self-distillation mechanism by connecting a distillation block that contains five lung nodule detectors. This helps to expedite convergence and improves the learning ability of the proposed architecture. Finally, the proposed scheme reduces false positives by complementing the main detector with auxiliary detectors. The proposed scheme has been rigorously evaluated on the 888 scans of the LUNA16 dataset and obtained a CPM score of 93.6%. The results demonstrate that incorporating bi-directional MIP images enables MEDS-Net to effectively distinguish nodules from their surroundings, which helps it achieve sensitivities of 91.5% and 92.8% at false positive rates of 0.25 and 0.5 per scan, respectively.
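
The MIP inputs described in this abstract can be illustrated with a minimal sketch. It assumes a CT volume stored as nested lists indexed [slice][row][column], measures slab thickness in slices rather than millimetres, and assumes that "forward" means the slab starting at the current slice and "backward" the slab ending at it. The abstract does not define these directions, and the function names `mip_slab` and `bidirectional_mip` are hypothetical.

```python
def mip_slab(volume, start, stop):
    """Maximum intensity projection over slices volume[start:stop]."""
    # Clamp the slab to the volume so edge slices still work.
    slab = volume[max(start, 0):min(stop, len(volume))]
    rows, cols = len(slab[0]), len(slab[0][0])
    # For every pixel position keep the brightest value along the slice axis.
    return [[max(s[r][c] for s in slab) for c in range(cols)]
            for r in range(rows)]


def bidirectional_mip(volume, index, n_slices):
    """Return (forward, backward) MIPs around slice `index`.

    Assumption: forward covers slices index .. index + n_slices - 1,
    backward covers slices index - n_slices + 1 .. index.
    """
    forward = mip_slab(volume, index, index + n_slices)
    backward = mip_slab(volume, index - n_slices + 1, index + 1)
    return forward, backward


# Toy volume: 4 slices of 2x2 pixels.
volume = [
    [[0, 1], [2, 3]],
    [[5, 0], [1, 1]],
    [[2, 2], [9, 0]],
    [[1, 7], [0, 4]],
]
forward, backward = bidirectional_mip(volume, index=1, n_slices=2)
```

In a real pipeline the slab length would be derived from the slice spacing so that it corresponds to 3, 5, or 10 mm.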

AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale

Yao Lu, Karol Hausman, Yevgen Chebotar, Mengyuan Yan, Eric Jang, Alexander Herzog, Ted Xiao, Alex Irpan, Mohi Khansari, Dmitry Kalashnikov

Category: Robotics

2021-11-09

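Robotic skills can be learned via imitation learning (IL) using user-provided demonstrations, or via reinforcement learning (RL) using large amounts of autonomously collected experience. The two methods have complementary strengths and weaknesses: RL can reach a high level of performance but requires exploration, which can be very time-consuming and unsafe; IL does not require exploration but only learns skills that are as good as the provided demonstrations. Can a single method combine the strengths of both approaches? A number of prior methods have aimed to address this question, proposing a variety of techniques that integrate elements of IL and RL. However, scaling up such methods to complex robotic skills that integrate diverse offline data and generalize to real-world scenarios still presents a major challenge. In this paper, our aim is to test the scalability of prior IL + RL algorithms and to devise a system, based on detailed empirical experimentation, that combines existing components in the most effective and scalable way. To that end, we present a series of experiments aimed at understanding the implications of each design decision, so as to develop a combined approach that can use demonstrations and heterogeneous prior data to attain the best performance on a range of real-world and realistic simulated robotic problems. Our complete method, which we call AW-Opt, combines elements of advantage-weighted regression [1, 2] and QT-Opt [3], providing a unified approach for integrating demonstrations and offline data for robotic manipulation. Please see https://awopt.github.io for more details.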
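
As a rough illustration of the advantage-weighted regression ingredient named in this abstract, the sketch below computes the usual exponential advantage weights and a weighted negative log-likelihood. It is not the AW-Opt algorithm, and it does not show how AW-Opt combines these weights with QT-Opt. The function names, the temperature value, and the weight clipping are assumptions made for illustration.

```python
import math


def awr_weights(advantages, temperature=1.0, max_weight=20.0):
    """Exponential advantage weights exp(A / temperature).

    Clipping at max_weight is an illustrative stabilisation choice,
    not something stated in the abstract.
    """
    return [min(math.exp(a / temperature), max_weight) for a in advantages]


def awr_loss(log_probs, advantages, temperature=1.0):
    """Weighted negative log-likelihood of the dataset actions.

    Actions with higher estimated advantage contribute more, so the
    policy imitates good actions more strongly than poor ones.
    """
    weights = awr_weights(advantages, temperature)
    total = sum(w * lp for w, lp in zip(weights, log_probs))
    return -total / len(log_probs)


# Three actions with below-average, average, and above-average advantage.
weights = awr_weights([-1.0, 0.0, 1.0])
loss = awr_loss([math.log(0.5), math.log(0.5)], [0.0, 0.0])
```

With zero advantages every weight is 1, so the loss reduces to plain behavioural cloning, which is one way to see how this objective sits between IL and RL.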

Investigation of Network Architecture for Multimodal Head-and-Neck Tumor Segmentation

Ye Li, Junyu Chen, Se-in Jang, Kuang Gong, Quanzheng Li

Category: Computer Vision

2022-12-21

Inspired by the recent success of Transformers in Natural Language Processing and of the Vision Transformer in Computer Vision, many researchers in the medical imaging community have flocked to Transformer-based networks for various mainstream medical tasks such as classification, segmentation, and estimation. In this study, we analyze two recently published Transformer-based network architectures for the task of multimodal head-and-neck tumor segmentation and compare their performance to the de facto standard 3D segmentation network, the nnU-Net. Our results show that modeling long-range dependencies may be helpful in cases where large structures are present and/or a large field of view is needed. However, for small structures such as head-and-neck tumors, the convolution-based U-Net architecture seems to perform well, especially when the training dataset is small and computational resources are limited.

DAG: Depth-Aware Guidance with Denoising Diffusion Probabilistic Models

Gyeongnyeon Kim, Wooseok Jang, Gyuseong Lee, Susung Hong, Junyoung Seo, Seungryong Kim

Category: Computer Vision

2022-12-17

In recent years, generative models have undergone significant advancement due to the success of diffusion models. The success of these models is often attributed to their use of guidance techniques, such as classifier and classifier-free methods, which provide effective mechanisms to trade off between fidelity and diversity. However, these methods are not capable of guiding a generated image to be aware of its geometric configuration, e.g., depth, which hinders the application of diffusion models to areas that require a certain level of depth awareness. To address this limitation, we propose a novel guidance approach for diffusion models that uses estimated depth information derived from the rich intermediate representations of diffusion models. To do this, we first present a label-efficient depth estimation framework using the internal representations of diffusion models. At the sampling phase, we use two guidance techniques to self-condition the generated image on the estimated depth map: the first uses pseudo-labeling, and the second uses a depth-domain diffusion prior. Experiments and extensive ablation studies demonstrate the effectiveness of our method in guiding diffusion models toward geometrically plausible image generation. The project page is available at https://ku-cvlab.github.io/DAG/.
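
For readers unfamiliar with the guidance techniques this abstract builds on, here is a minimal sketch of classifier-free guidance, one of the existing methods it mentions. It is not the depth-aware guidance proposed in the paper. The noise predictions are stand-in lists of floats, the function name is hypothetical, and the convention used here (scale 0 gives the unconditional prediction, scale 1 the conditional one) is only one of several in use.

```python
def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Blend unconditional and conditional noise predictions.

    guided = eps_uncond + scale * (eps_cond - eps_uncond)
    Scales above 1 push the sample further toward the condition,
    typically trading diversity for fidelity.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Stand-in predictions for a two-element "image".
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 1.0]
guided = classifier_free_guidance(eps_uncond, eps_cond, scale=2.0)
```

Only the first element moves here, because the two predictions already agree on the second.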

Accurate Open-set Recognition for Memory Workload

Jun-Gi Jang, Sooyeon Shim, Vladimir Egay, Jeeyong Lee, Jongmin Park, Suhyun Chae, U Kang

Category: Artificial Intelligence

2022-12-17

How can we accurately identify new memory workloads while classifying known memory workloads? Verifying DRAM (Dynamic Random Access Memory) using various workloads is an important task to guarantee the quality of DRAM. A crucial component in the process is open-set recognition, which aims to detect new workloads not seen in the training phase. Despite its importance, however, existing open-set recognition methods are unsatisfactory in terms of accuracy since they fail to exploit the characteristics of workload sequences. In this paper, we propose Acorn, an accurate open-set recognition method that captures the characteristics of workload sequences. Acorn extracts two types of feature vectors to capture sequential patterns and spatial locality patterns in memory access. Acorn then uses the feature vectors to accurately classify a subsequence into one of the known classes or identify it as the unknown class. Experiments show that Acorn achieves state-of-the-art accuracy, giving up to 37 percentage points higher unknown-class detection accuracy than existing methods while achieving comparable known-class classification accuracy.

Can We Find Strong Lottery Tickets in Generative Models?

Sangyeop Yeo, Yoojin Jang, Jy-yong Sohn, Dongyoon Han, Jaejun Yoo

Category: Computer Vision | Machine Learning

2022-12-16

Yes. In this paper, we investigate strong lottery tickets in generative models, that is, subnetworks that achieve good generative performance without any weight update. Neural network pruning is considered the main cornerstone of model compression for reducing the costs of computation and memory. Unfortunately, pruning a generative model has not been extensively explored, and all existing pruning algorithms suffer from excessive weight-training costs, performance degradation, limited generalizability, or complicated training. To address these problems, we propose to find a strong lottery ticket via moment-matching scores. Our experimental results show that the discovered subnetwork can perform similarly to or better than the trained dense model even when only 10% of the weights remain. To the best of our knowledge, we are the first to show the existence of strong lottery tickets in generative models and to provide an algorithm to find them stably. Our code and supplementary materials are publicly available.
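
The idea of a subnetwork obtained "without any weight update" can be made concrete with a small sketch: the random weights stay frozen, and a binary mask keeps only the highest-scoring fraction of them. This shows only the masking step under assumed names (`top_k_mask`, `keep_fraction`). The scores here are random placeholders, whereas in the paper they are obtained via moment matching, which is not shown.

```python
import random


def top_k_mask(scores, keep_fraction):
    """Binary mask that keeps the highest-scoring fraction of weights."""
    k = max(1, round(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:k])
    return [1 if i in kept else 0 for i in range(len(scores))]


rng = random.Random(0)
# Frozen random weights: these are never trained.
weights = [rng.gauss(0.0, 1.0) for _ in range(20)]
# Placeholder scores; in practice the scores are what gets learned.
scores = [rng.random() for _ in range(20)]

mask = top_k_mask(scores, keep_fraction=0.10)
subnetwork = [w * m for w, m in zip(weights, mask)]
```

Only the mask changes during the search, so the surviving weights keep their initial random values.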

Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders

Jongseong Jang, Daeun Kyung, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae, Edward Choi

Category: Machine Learning | Computer Vision

2022-12-14

Deep neural networks have been successfully adopted in diverse domains, including pathology classification based on medical images. However, large-scale, high-quality data for training powerful neural networks are rare in the medical domain because the labeling must be done by qualified experts. Researchers recently tackled this problem with some success by taking advantage of models pre-trained on large-scale general-domain data. Specifically, researchers took contrastive image-text encoders (e.g., CLIP) and fine-tuned them with chest X-ray images and paired reports to perform zero-shot pathology classification, thus completely removing the need for pathology-annotated images to train a classification model. Existing studies, however, fine-tuned the pre-trained model with the same contrastive learning objective and failed to exploit the multi-labeled nature of medical image-report pairs. In this paper, we propose a new fine-tuning strategy based on sentence sampling and positive-pair loss relaxation to improve downstream zero-shot pathology classification performance; it can be applied to any pre-trained contrastive image-text encoder. Our method consistently showed dramatically improved zero-shot pathology classification performance on four different chest X-ray datasets and three different pre-trained models (5.77% average AUROC increase). In particular, fine-tuning CLIP with our method achieved performance comparable to, or marginally better than, that of board-certified radiologists (0.619 vs 0.625 in F1 score and 0.530 vs 0.544 in MCC) in zero-shot classification of five prominent diseases from the CheXpert dataset.
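
Zero-shot classification with a contrastive image-text encoder can be sketched as follows. The example assumes one common scheme in which each pathology is scored by comparing the image embedding with the embeddings of a positive and a negative text prompt. The abstract does not say which scheme the paper uses, the embeddings below are toy vectors rather than encoder outputs, and the function names and temperature value are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_probability(image_emb, pos_text_emb, neg_text_emb,
                          temperature=0.07):
    """Softmax over (positive, negative) prompt similarities.

    Returns the probability assigned to the positive prompt,
    e.g. a prompt stating that the finding is present.
    """
    pos = cosine(image_emb, pos_text_emb) / temperature
    neg = cosine(image_emb, neg_text_emb) / temperature
    m = max(pos, neg)  # subtract the max for numerical stability
    e_pos, e_neg = math.exp(pos - m), math.exp(neg - m)
    return e_pos / (e_pos + e_neg)


# Toy 2-D embeddings: the image lies close to the positive prompt.
image = [1.0, 0.0]
p = zero_shot_probability(image, pos_text_emb=[0.9, 0.1],
                          neg_text_emb=[0.1, 0.9])
```

Because each pathology is scored independently, an image can be positive for several findings at once, which matches the multi-label setting the abstract describes.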

NMS Strikes Back

Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl

Category: Computer Vision

2022-12-12

Detection Transformer (DETR) directly transforms queries to unique objects by using one-to-one bipartite matching during training and enables end-to-end object detection. Recently, these models have surpassed traditional detectors on COCO with undeniable elegance. However, they differ from traditional detectors in multiple designs, including model architecture and training schedules, and thus the effectiveness of one-to-one matching is not fully understood. In this work, we conduct a strict comparison between the one-to-one Hungarian matching in DETRs and the one-to-many label assignments in traditional detectors with non-maximum suppression (NMS). Surprisingly, we observe that one-to-many assignments with NMS consistently outperform standard one-to-one matching under the same setting, with a significant gain of up to 2.5 mAP. Our detector, which trains Deformable-DETR with traditional IoU-based label assignment, achieves 50.2 COCO mAP within 12 epochs (1x schedule) with a ResNet50 backbone, outperforming all existing traditional or transformer-based detectors in this setting. On multiple datasets, schedules, and architectures, we consistently show that bipartite matching is unnecessary for performant detection transformers. Furthermore, we attribute the success of detection transformers to their expressive transformer architecture. Code is available at https://github.com/jozhang97/DETA.
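
The NMS step referred to in this abstract can be sketched as the standard greedy procedure below. It is a generic illustration rather than code from the DETA repository. Boxes are assumed to be (x1, y1, x2, y2) tuples, and the 0.5 threshold is an arbitrary example value.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The second box overlaps the first with an IoU of 81/119, about 0.68, so it is suppressed, while the distant third box survives.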
